@dacorvo dacorvo commented Sep 26, 2025

What does this PR do?

This bumps the AWS Neuron SDK version to 2.26.

This also bumps the torch version to 2.8, which in turn requires updating vLLM to 0.10.2 (the first version supporting PyTorch 2.8).

There are some remaining errors in:

  • training tests:
    FAILED tests/training/test_custom_modeling.py::test_custom_model_tie_weights - Failed: Test failed with SafetensorError: Error while deserializing header: incomplete metadata, file not fully covered
  • diffusers tests:
    the Flux test hangs

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@dacorvo dacorvo force-pushed the neuron_sdk_2.26 branch 6 times, most recently from 4d3ca92 to 111d3ab Compare September 30, 2025 12:30
@dacorvo dacorvo force-pushed the neuron_sdk_2.26 branch 2 times, most recently from 9a0fd7c to f9e78c5 Compare October 1, 2025 10:06
@dacorvo dacorvo marked this pull request as ready for review October 1, 2025 10:07
        uses: ./.github/actions/prepare_venv
      - name: Install optimum-neuron
        uses: ./.github/actions/install_optimum_neuron
      - name: Install datasets dependencies

Perhaps rephrase it to "Install audio tests dependencies"

@dacorvo dacorvo force-pushed the neuron_sdk_2.26 branch 2 times, most recently from 4e20238 to 8a9ac50 Compare October 1, 2025 13:48
This avoids importing docker and openai for all tests
Note that GitHub variables used for inputs can only be of type string.
This is why the 'use_cuda' variable is a string and not a boolean.
Being able to configure the PyTorch installation allows a given
workflow to install a specific torch version, or to use CUDA (some
packages are not compatible with the CPU version of PyTorch).
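
As an illustration of the note above, here is a minimal sketch of what a composite action with a string-typed 'use_cuda' input could look like. It is not the action from this PR: the action name, the 'torch_version' input, the default values and the pip commands are all assumptions made for the example. Only the 'use_cuda' name and the fact that inputs are strings come from the commit message.

```yaml
# Hypothetical composite action, for illustration only.
# Everything except the 'use_cuda' input name is an assumption.
name: Install PyTorch
description: Install a configurable PyTorch build
inputs:
  torch_version:
    description: PyTorch version to install (hypothetical input)
    required: false
    default: "2.8.0"
  use_cuda:
    description: Set to the string "true" to install a CUDA-enabled build
    required: false
    default: "false"
runs:
  using: composite
  steps:
    - name: Install PyTorch (CPU)
      # Inputs are always strings, so compare against the string 'true'
      # instead of relying on boolean evaluation.
      if: ${{ inputs.use_cuda != 'true' }}
      shell: bash
      run: pip install "torch==${{ inputs.torch_version }}" --index-url https://download.pytorch.org/whl/cpu
    - name: Install PyTorch (CUDA)
      if: ${{ inputs.use_cuda == 'true' }}
      shell: bash
      # On Linux, the default PyPI wheels are CUDA-enabled.
      run: pip install "torch==${{ inputs.torch_version }}"
```

A workflow would then pass the flag as a quoted string, for example `use_cuda: "true"` under the step's `with:` block. The explicit string comparison matters because a non-empty string such as "false" would otherwise be treated as truthy.
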
neuronx-distributed is always required.
@dacorvo dacorvo force-pushed the neuron_sdk_2.26 branch 5 times, most recently from a34b5bb to 1238def Compare October 1, 2025 14:20
@dacorvo dacorvo force-pushed the neuron_sdk_2.26 branch 3 times, most recently from c098698 to aca8f74 Compare October 2, 2025 11:03
Some tests are failing with compel>=2.2.0
These tests hang with AWS Neuron SDK 2.26
@tengomucho tengomucho left a comment

LGTM

@dacorvo dacorvo merged commit 5a01d92 into main Oct 2, 2025
12 checks passed
@dacorvo dacorvo deleted the neuron_sdk_2.26 branch October 2, 2025 13:39